DimmWitted: A Study of Main-Memory Statistical Analytics
نویسندگان
چکیده
We perform the first study of the tradeoff space of access methods and replication to support statistical analytics using first-order methods executed in the main memory of a Non-Uniform Memory Access (NUMA) machine. Statistical analytics systems differ from conventional SQL-analytics in the amount and types of memory incoherence that they can tolerate. Our goal is to understand tradeoffs in accessing the data in rowor column-order and at what granularity one should share the model and data for a statistical task. We study this new tradeoff space and discover that there are tradeoffs between hardware and statistical efficiency. We argue that our tradeoff study may provide valuable information for designers of analytics engines: for each system we consider, our prototype engine can run at least one popular task at least 100× faster. We conduct our study across five architectures using popular models, including SVMs, logistic regression, Gibbs sampling, and neural networks.
منابع مشابه
Tradeoffs in Main-Memory Statistical Analytics from Impala to DimmWitted
Recent years have seen a surge in main-memory SQL-style analytic solutions to quickly deliver business critical information over massive data sets [1, 7, 14]. At the same time, there is an arms race to offer increasingly sophisticated statistical analytics inspired by the success of web search, voice recognition, and image analysis, e.g., Google Brain [8], Facebook [6], and Microsoft's Adam [2]...
متن کاملP-V-L Deep: A Big Data Analytics Solution for Now-casting in Monetary Policy
The development of new technologies has confronted the entire domain of science and industry with issues of big data's scalability as well as its integration with the purpose of forecasting analytics in its life cycle. In predictive analytics, the forecast of near-future and recent past - or in other words, the now-casting - is the continuous study of real-time events and constantly updated whe...
متن کاملBig Data Analytics and Now-casting: A Comprehensive Model for Eventuality of Forecasting and Predictive Policies of Policy-making Institutions
The ability of now-casting and eventuality is the most crucial and vital achievement of big data analytics in the area of policy-making. To recognize the trends and to render a real image of the current condition and alarming immediate indicators, the significance and the specific positions of big data in policy-making are undeniable. Moreover, the requirement for policy-making institutions to ...
متن کاملSQL- and Operator-centric Data Analytics in Relational Main-Memory Databases
Data volume and complexity continue to increase, as does the need for insight into data. Today, data management and data analytics are most often conducted in separate systems: database systems and dedicated analytics systems. This separation leads to timeand resource-consuming data transfer, stale data, and complex IT architectures. In this paper we show that relational main-memory database sy...
متن کاملMinuet: A Scalable Distributed Multiversion B-Tree
Data management systems have traditionally been designed to support either long-running analytics queries or short-lived transactions, but an increasing number of applications need both. For example, online games, socio-mobile apps, and e-commerce sites need to not only maintain operational state, but also analyze that data quickly to make predictions and recommendations that improve user exper...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- PVLDB
دوره 7 شماره
صفحات -
تاریخ انتشار 2014